Introduction

Background and Motivation

Jargon is an innovative Chrome extension (Chrome Web Store, Official Website), created by my friend, that uses generative AI to transform English web content into learning opportunities. Launched in June 2024, Jargon offers two types of learning experiences: foreign language learning (Spanish, Chinese, etc.) and English style adaptation (GRE vocabulary, TikTok slang, etc.).

How Jargon Works

Customization Options

Figure 1: User Settings Interface showing customization options

Key Features

Language Selection

All types, from foreign languages like Spanish and Chinese to English variations such as TikTok Slang

Learning Goals

• Difficulty: Easy-Hard (1-10)
• Daily Target: 10-100 questions

Question Density

Controls the percentage of eligible sentences (0-100%) highlighted for practice on each webpage
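
As a rough sketch of how such a setting could behave (an illustration only, not Jargon's actual implementation), each eligible sentence can be selected independently with probability density/100:

```python
import random

def select_sentences(sentences, density, seed=None):
    """Select each eligible sentence with probability density/100.

    The independent-sampling rule is an assumption for illustration,
    not Jargon's actual selection algorithm.
    """
    rng = random.Random(seed)
    return [s for s in sentences if rng.random() < density / 100]

sentences = ["The ball rolls.", "Ice forms in layers.", "A gentle breeze blows."]
print(select_sentences(sentences, 100))  # density 100: every sentence selected
print(select_sentences(sentences, 0))    # density 0: nothing selected
```

At density 100 every eligible sentence is selected and at 0 none are; intermediate settings thin out the highlights proportionally.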

Display Settings

• Text Style: Highlight or underline
• Site Controls: Enable/disable per website or temporarily

Text Selection Methods

Highlight Example

Figure 2a: Highlight Style - Text appears with background color emphasis

Underline Example

Figure 2b: Underline Style - Text appears with underline emphasis

Language Transformation Examples

Question Generation Process

Figure 3: Question Generation Process - Users select text from any webpage to create practice questions

GRE Question Answer

Figure 4a: GRE Mode - Advanced vocabulary transformation

TikTok Style Question

Figure 4b: TikTok Style - Contemporary social media language

Spanish Translation

Figure 4c: Spanish Mode - English to Spanish translation

The GRE mode enhances vocabulary learning by replacing common words with their more sophisticated alternatives (e.g., “good” becomes “exemplary”), while TikTok style transforms formal English into contemporary social media expressions (e.g., “That’s cool” becomes “That’s bussin fr fr”). These AI-powered transformations maintain the original meaning while adapting to different language registers.

Research Questions and Hypotheses

Ten months after launch, with 93 registered users, this analysis investigates two key aspects of user behavior:

  1. Usage Context and Platform Patterns
    • Research Question: “What are the common contexts and platforms where users engage with Jargon?”
    • Hypothesis: Users primarily engage with Jargon on social media and entertainment sites, while disabling it on academic sites.
    • Rationale: Understanding where users naturally integrate Jargon into their browsing can inform platform-specific optimization and marketing strategies.
  2. Feature Adoption and User Success
    • Research Question: “What features and settings distinguish active users from occasional users?”
    • Hypothesis: Active users utilize more customization options (density settings, highlight styles) and set achievable daily goals.
    • Rationale: Identifying the features that correlate with sustained engagement can guide onboarding improvements and feature prioritization.

Methods

Data Collection

The data for this analysis was collected from Jargon’s Supabase database, covering user interactions from the extension’s launch in June 2024 through March 16, 2025. The dataset comprises five main tables:

Table 1: Overview of Dataset Components
Dataset     Records   Description
Profiles         92   User profiles and settings
Questions      2442   Generated practice questions
Words          1594   Vocabulary entries and translations
Levels          117   User progression through difficulty levels
Websites         27   Websites where extension was disabled

Dataset Descriptions

1. Profiles Dataset

Table 2: Key Variables in Profiles Dataset
Variable             Type          Description                         Notes
user_id              Primary Key   Unique identifier for each user     Anonymized identifier
level                Integer       Current proficiency level           Range: 1-10
paused               Boolean       Extension status on Chrome          TRUE/FALSE (Default: TRUE)
chrome_notifs        Boolean       Notification preferences            TRUE/FALSE
language             String        Current selected language mode      e.g., 'GRE Vocabulary', 'TikTok Slang'
last_question_time   DateTime      Timestamp of most recent question   UTC timezone
week_streak          Integer       Consecutive weeks of activity
daily_streak         Integer       Consecutive days of activity
daily_progress       Integer       Questions completed today           Resets daily
daily_goal           Integer       Target questions per day            User-set goal
density              Integer       Frequency of questions              Percentage of eligible sentences shown (0-100)
highlightStyle       String        Text selection preference           'highlight' or 'underline'

2. Questions Dataset

Table 3: Key Variables in Questions Dataset
Variable            Type              Description                  Notes
question_id         Primary Key       Unique question identifier
user_id             Foreign Key       Associated user              References profiles
created_at          DateTime          Question generation time     UTC timezone
sentence            Text              Original selected text       English source content
word                String            Target word for learning
language            String            Transformation mode          Selected language mode
original_sentence   Text              Source text                  Pre-transformation content
options_array       Array of String   Multiple choice options      Even indices: target-language options; odd indices: English translations
answered_at         DateTime          Completion timestamp         NULL if unanswered
chosen_option       String            User's answer                NULL if unanswered
user_rating         Integer           Question quality rating      Feature not yet implemented
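
Given the interleaved layout of options_array described in Table 3, options can be unpacked into (option, translation) pairs; the helper below is our own illustration, not part of Jargon's codebase:

```python
def pair_options(options_array):
    """Unpack the interleaved options_array into (option, translation) pairs.

    Per Table 3: even indices hold target-language options, odd indices
    hold their English translations.
    """
    return list(zip(options_array[0::2], options_array[1::2]))

row = ["bueno", "good", "ejemplar", "exemplary"]
print(pair_options(row))  # [('bueno', 'good'), ('ejemplar', 'exemplary')]
```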

3. Words Dataset

Table 4: Key Variables in Words Dataset
Variable      Type          Description            Notes
created_at    DateTime      Word entry timestamp   UTC timezone
word          String        Target vocabulary
language      String        Language mode
user_id       Foreign Key   Associated user        References profiles
translation   Text          English translation    AI-generated translation
status        String        Learning status        Currently all set to 'learning'

4. Levels Dataset

Table 5: Key Variables in Levels Dataset
Variable   Type          Description        Notes
user_id    Foreign Key   Associated user    References profiles
language   String        Language mode
level      Integer       Difficulty level   Range: 1-10

5. Websites Dataset

Table 6: Key Variables in Websites Dataset
Variable   Type          Description       Notes
user_id    Foreign Key   Associated user   References profiles
website    String        Blocked URL       Sites where Jargon is disabled

Data Processing

Data Cleaning Steps

Profile Enhancement

  • Aggregated user activity metrics from various tables
  • Created derived engagement metrics:
    • Total questions generated
    • Questions answered
    • Number of blocked websites
    • Unique difficulty levels attempted
  • Handled missing values by replacing NAs with 0 for count-based metrics
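
The aggregation and NA-handling steps above can be sketched in Python (the records below are toy stand-ins for the Profiles and Questions tables; field names follow the dataset descriptions):

```python
from collections import Counter

# Toy stand-ins for the Profiles and Questions tables.
profiles = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
questions = [
    {"user_id": 1, "answered_at": "2025-01-05"},
    {"user_id": 1, "answered_at": None},          # generated but never answered
    {"user_id": 2, "answered_at": "2025-01-06"},
]

generated = Counter(q["user_id"] for q in questions)
answered = Counter(q["user_id"] for q in questions if q["answered_at"] is not None)

# Join the counts onto profiles; users missing from a counter default
# to 0, which is the "replace NAs with 0" step for count-based metrics.
enhanced = [
    {**p,
     "generated_questions": generated.get(p["user_id"], 0),
     "answered_questions": answered.get(p["user_id"], 0)}
    for p in profiles
]
print(enhanced)
```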

Derived Variables

Table 7: Overview of Derived Variables
Variable              Calculation                                        Purpose
generated_questions   Count of questions per user                        Measure overall engagement
answered_questions    Count of answered questions per user               Measure learning completion
blocked_sites         Count of blocked websites per user                 Understand avoidance patterns
levels_attempted      Count of unique language-difficulty combinations   Track learning progression

Analysis Methods

This analysis addresses each research question in turn, allowing focused exploration and detailed insights. We first investigate usage context and platform patterns by analyzing website-blocking behavior and user interaction patterns; we then examine feature adoption and how different customization choices correlate with engagement levels.

Data Exploration

Our exploratory data analysis examines patterns that inform both research questions about usage context and feature adoption. We organize our exploration into three main categories:

1. Platform and Website Interaction Patterns

[Relevant to Research Question 1: Usage Context and Platform Patterns]

Figure 5: Website Usage Analysis - Distribution of blocked websites by category (left) and frequency of individual websites (right)

The analysis of blocked websites reveals distinct patterns in how users interact with the Jargon extension. Professional tools, particularly Salesforce and AI platforms, emerge as the most frequently blocked categories, suggesting that users primarily utilize Jargon during work-related activities. The presence of development environment blocks in the dataset indicates that the user base includes some technical professionals, though this represents a modest portion of the overall usage. Educational content also features prominently in the blocked websites, with users frequently disabling the extension on documentation sites and educational platforms, possibly to maintain focus during concentrated learning sessions. Interestingly, social media platforms show lower blocking rates than initially hypothesized, with users selectively choosing which major platforms to exclude from Jargon’s functionality, rather than implementing broad-based social media blocks.
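
The category tallies behind Figure 5 can be sketched as a simple domain-to-category lookup. The map below is hypothetical, chosen to mirror the categories discussed above; the report's actual categorization scheme is not part of the raw dataset:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain-to-category map for illustration only.
CATEGORIES = {
    "salesforce.com": "Professional tools",
    "chat.openai.com": "AI platforms",
    "github.com": "Development",
    "docs.python.org": "Educational",
    "twitter.com": "Social media",
}

def categorize(url):
    host = urlparse(url).netloc or url  # tolerate bare domains without a scheme
    return CATEGORIES.get(host, "Other")

blocked = ["https://salesforce.com/home", "twitter.com", "https://github.com/x"]
print(Counter(categorize(u) for u in blocked))
```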

2. Language Mode and Feature Usage

[Relevant to Both Research Questions]

Figure 6: Scatter plot showing the relationship between user adoption and question generation across different language modes

The scatter plot reveals several key insights about language mode usage patterns:

  1. Language Mode Popularity:
    • Spanish emerges as the most active language mode, with both the highest number of questions generated (~800) and a substantial user base (~30 users)
    • GlizzyTalk and Tamil show moderate adoption, each generating around 300 questions
    • Korean and GRE Vocabulary form a middle tier with similar question counts (~200)
  2. Usage Intensity Patterns:
    • A clear positive correlation exists between the number of unique users and questions generated
    • However, some language modes (like Tamil) show high question generation despite fewer users, suggesting intense usage by dedicated learners
    • Mandarin Chinese shows moderate activity with relatively few users, indicating focused learning by a small group
  3. User Adoption Tiers:
    • High adoption (>20 users): Spanish
    • Medium adoption (10-20 users): Tamil, GlizzyTalk, Mandarin Chinese
    • Low adoption (<10 users): Most other languages including French, German, and specialized modes like SAT Vocabulary
  4. Engagement Distribution:
    • Traditional language learning (Spanish, Tamil, Korean) generally shows higher engagement than specialized English modes (GRE, SAT Vocabulary)
    • Some newer languages (Croatian, Bulgarian, Urdu) show minimal activity, suggesting potential for growth or need for better promotion

These patterns suggest that while traditional language learning drives most user activity, there’s significant variation in how different language modes are utilized, with some showing intense usage by small groups while others have broader but less intensive adoption.
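
Per-language user counts, question counts, and the adoption tiers used above can be computed along these lines (the rows are toy stand-ins for the Questions table; the tier cut-offs match the analysis):

```python
from collections import defaultdict

# Toy (user_id, language) pairs standing in for the Questions table.
rows = [(1, "Spanish"), (2, "Spanish"), (1, "Spanish"), (3, "Tamil"), (3, "Tamil")]

users_by_lang = defaultdict(set)
questions_by_lang = defaultdict(int)
for user_id, language in rows:
    users_by_lang[language].add(user_id)
    questions_by_lang[language] += 1

def adoption_tier(n_users):
    """Tier cut-offs taken from the analysis above (>20, 10-20, <10)."""
    if n_users > 20:
        return "High"
    return "Medium" if n_users >= 10 else "Low"

for lang, count in questions_by_lang.items():
    print(lang, len(users_by_lang[lang]), count, adoption_tier(len(users_by_lang[lang])))
```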

Figure 7: Word frequency analysis showing common words (left) and word pairs (right) in learning content. Colors indicate frequency of occurrence, with darker shades representing higher frequencies.

The word frequency analysis reveals patterns in user-selected content:

  1. Single Word Patterns:
    • Technical terms (e.g., “particle”, “incremental”) appear frequently
    • Action words (e.g., “grows”, “forms”) show dynamic content selection
    • Descriptive terms (e.g., “gentle”, “floating”) indicate varied context
  2. Word Pair Patterns:
    • Scientific combinations (e.g., “concentric layers”, “stiff breeze”)
    • Process descriptions (e.g., “gentle churn”, “ice form”)
    • Movement patterns (e.g., “ball rolls”, “floating particle”)

These patterns suggest users often engage with technical and descriptive content, particularly in scientific or educational contexts.
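
The unigram and bigram tallies behind Figure 7 can be reproduced with a simple counter; the sentences below are toy examples echoing the vocabulary observed in the analysis:

```python
import re
from collections import Counter

# Toy sentences echoing the vocabulary visible in Figure 7.
sentences = [
    "The gentle churn forms concentric layers",
    "A floating particle grows as ice forms",
]

unigrams = Counter()
bigrams = Counter()
for s in sentences:
    words = re.findall(r"[a-z]+", s.lower())
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))  # pairs within one sentence only

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```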

3. Temporal and Engagement Patterns

[Relevant to Both Research Questions]

Figure 8: Daily activity patterns showing question generation and active users with their respective averages (dashed lines) over the observation period, based on UTC timezone. Questions average: 12.5 per day; Users average: 2.2 per day.

Figure 9: Weekly activity patterns showing average questions generated and active users by day of week (UTC timezone), with error bars indicating standard error.

The temporal analysis reveals several key patterns (Note: All timestamps are in UTC, which may shift actual usage patterns by several hours depending on users’ local time zones):

  1. Daily Timeline Trends (Figure 8):
    • Daily questions show significant fluctuations, with peaks reaching up to 200 questions
    • The platform averages 12.5 questions per day
    • Active users average 2.2 per day, typically ranging from 1-5
    • Due to UTC recording, activity peaks may appear shifted from users’ actual local time
  2. Weekly Cycle Patterns (Figure 9):
    • Box plots reveal the full distribution of activity across each day of the week
    • Median activity levels show day-to-day variations, with interpretation requiring UTC timezone consideration
    • Outliers indicate occasional high-activity days across all weekdays
    • The spread of the boxes shows varying consistency in usage patterns across different days
    • Some days show wider interquartile ranges, suggesting more variable activity levels
  3. Growth and Engagement:
    • Overall growth trends are evident despite daily variations
    • User base shows gradual expansion with periodic adoption spikes
    • Both casual and intensive usage patterns are observed
    • Timezone-independent metrics like daily totals and weekly averages provide reliable growth indicators

Note: Future analysis would benefit from timezone-adjusted data to more accurately reflect users’ local activity patterns.
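
The timezone caveat can be made concrete: converting UTC timestamps before bucketing by calendar day can move activity across day boundaries. A sketch with toy timestamps (the UTC-5 offset is an arbitrary example with no DST handling):

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Toy question timestamps in UTC, standing in for Questions.created_at.
created = [
    datetime(2025, 3, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2025, 3, 2, 1, 15, tzinfo=timezone.utc),
]

def daily_counts(timestamps, tz=timezone.utc):
    """Count events per calendar day after converting to time zone `tz`."""
    counts = defaultdict(int)
    for t in timestamps:
        counts[t.astimezone(tz).date()] += 1
    return dict(counts)

# In UTC the two questions fall on different days; at UTC-5 both land
# on March 1, illustrating how day-level buckets can shift.
print(daily_counts(created))
print(daily_counts(created, timezone(timedelta(hours=-5))))
```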

4. User Engagement Distribution

[Relevant to Research Question 2: Feature Adoption and User Success]

Figure 10: Distribution of key engagement metrics across users, showing individual box plots for each metric with median and interquartile range (IQR) statistics. Each plot uses a distinct color and includes summary statistics.

The distribution of engagement metrics reveals distinct patterns in user behavior:

  1. Question Generation and Completion:
    • Generated questions show a median of 5.5 with substantial right-skew
    • Answered questions closely track generation, with a median of 5.5
    • Both metrics show several high-activity outliers, indicating power users
  2. Website Blocking Behavior:
    • Users typically block 0 sites (median)
    • An interquartile range of 0 indicates that most users never block any sites
    • Few users block more than 4 sites
  3. Level Progression:
    • Users typically attempt one level/language combination (median: 1)
    • The compact IQR of 0 levels indicates focused learning
    • A small number of users explore 4+ different levels

These patterns suggest a typical engagement profile of moderate, focused activity with a distinct subset of highly engaged users.
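
The median and IQR statistics summarized in Figure 10 are straightforward to compute per metric; a sketch with a toy right-skewed sample (the values are illustrative, not the study data):

```python
from statistics import median, quantiles

# Toy per-user counts of generated questions (right-skewed, like Figure 10);
# illustrative values only, not the actual study data.
generated = [0, 1, 2, 4, 5, 6, 7, 10, 42, 250]

q1, q2, q3 = quantiles(generated, n=4)  # quartiles; q2 equals the median
print("median:", median(generated))
print("IQR:", q3 - q1)
```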

Results

Research Question 1: Usage Context and Platform Patterns

Website Usage Analysis

[Current website analysis content]

Platform Usage Patterns

[To be developed]

Research Question 2: Feature Adoption and User Success

Feature Correlation Analysis

Figure 11: Correlation matrix of user engagement features. Circle size and color intensity indicate correlation strength, with blue showing positive correlations and red showing negative correlations.

The correlation analysis reveals several key relationships between feature adoption and user engagement:

  1. Display Preferences and Engagement:
    • Users who prefer the highlight style (vs. underline) show markedly higher engagement
    • Strong positive correlations with both question generation (0.67) and completion (0.59)
    • This suggests the highlight feature may be more effective for sustained learning
  2. Feature Usage Patterns:
    • Very strong correlation (0.92) between question generation and completion rates
    • Users who block more sites tend to be more active (0.41 correlation with answered questions)
    • Daily goal setting shows minimal impact on actual usage patterns
  3. Learning Progression Indicators:
    • Strong positive correlations between levels attempted and question metrics (0.71, 0.64)
    • Users exploring multiple levels/languages show higher overall engagement
    • This suggests successful users tend to diversify their learning experience
  4. Feature Customization Impact:
    • Positive correlations between customization features (highlight style, blocked sites)
    • Users who customize their experience show higher engagement levels
    • This indicates the importance of personalization options

These findings provide strong evidence that certain feature combinations and usage patterns are associated with higher engagement levels, particularly:

  • The choice of highlight style over underlining
  • Active customization of the learning environment
  • Exploration of multiple language levels
  • Consistent completion of generated questions

[Additional sections on feature adoption patterns…]

Conclusions and Summary

Key Findings

  1. Usage Context and Platform Patterns
    • [Summary of actual findings from your data about platform usage]
    • [Insights about language mode preferences]
  2. Language Learning Behavior
    • [Summary of findings about language mode usage]
    • [Patterns in language mode switching]
  3. Feature Adoption and User Success
    • [Summary of findings about feature usage]
    • [Patterns in feature adoption]

Limitations

  1. Sample Size
    • Limited user base (93 users) affects statistical power
    • Early adoption phase may not represent typical usage
  2. Time Frame
    • 10-month observation period may not capture long-term learning patterns
    • Launch period effects may influence usage patterns

Future Directions

  1. Product Development
    • [Recommendations based on actual findings]
    • [Suggested improvements supported by data]
  2. Research Extensions
    • Longer-term user tracking needed
    • Investigation of specific feature impacts on retention